-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[COLLECTIONS-843] Implement Layered Bloom filter #402
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #402 +/- ##
============================================
+ Coverage 81.29% 81.57% +0.28%
- Complexity 4640 4742 +102
============================================
Files 290 295 +5
Lines 13545 13751 +206
Branches 2003 2022 +19
============================================
+ Hits 11011 11217 +206
Misses 1933 1933
Partials 601 601 ☔ View full report in Codecov by Sentry. |
Hi @Claudenw PS: You can detect all failures before pushing by running the default Maven goal on the command line with a simple |
mvn doesn't catch all the issues :( I just spend a number of hours working out new tests for coverage changes. |
Gentlemen, Can I get eyes on the new code please. I think it is correct and fixes all the previously identified issues. |
@garydgregory @aherbert nudge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK here. @aherbert ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted a few changes. I did not look at the test code. I will do so in the next few days.
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilterProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilterProducer.java
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilterProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilterProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/LayerManager.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/LayerManager.java
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/LayerManager.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/LayerManager.java
Outdated
Show resolved
Hide resolved
src/main/java/org/apache/commons/collections4/bloomfilter/Shape.java
Outdated
Show resolved
Hide resolved
@aherbert thank you for your review. Changes have been made to the base code. One of these days I will figure out how you see all those things that I seem to miss. |
…ons into layered_filter
Let's make sure we let GitHub squash the commits once this is approved. |
@aherbert Have you had a chance to look at the test code? I have a series of blog posts about Bloom filters coming out "soon" (depending on when my company gets around to finalizing it). The second one in the series talks about commons-collections Bloom filters. I am hoping we have this in the 4.5-SNAPSHOT before then. Thanks to you and @garydgregory for all your assistance in making the entire package a better submission. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Claudenw There are some minor things to clean up here but no major issues. Can you check the LayerManagerTest for testAdvanceOnCalculatedFull and testAdvanceOnSaturation are using the correct maxN
.
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterProducerTest.java
Show resolved
Hide resolved
assertTrue(cp.forEachRemaining()); | ||
|
||
// test last item not checked | ||
cp = new CountingPredicate<>(ary, (x, y) -> y == Integer.valueOf(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not testing what you want. You never call cp.test. So when you call cp.forEachRemaining your predicate will be called with (1, null) and then (2, null). You should check for this. But you are checking y == 2 which is not targeting any functionality of the CountingPredicate. A similar argument for the test below where you check y == 1. Since y will be null in forEachRemaining this is not testing a completion of the forEachRemaining loop.
You should add a test case that when you call cp.test(Z) that the constructor BiPredicate receives (1, Z) then (2, Z).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
SparseDefaultBloomFilter(final Shape shape) { | ||
public SparseDefaultBloomFilter(final Shape shape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These do not need to be public since they are used only in the package. If I revert this change the test suite still functions.
Are you using them externally by importing collections test-jar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is exactly the usage. In testing a new BloomFilter implementation is is handy to have a clean implementation of a sparse filter as used in the testing to verify correct functioning of the new Bloom filter.
/** | ||
* A default implementation of a non-sparse Bloom filter. | ||
*/ | ||
public static class NonSparseDefaultBloomFilter extends AbstractDefaultBloomFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to be public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
src/test/java/org/apache/commons/collections4/bloomfilter/LayerManagerTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/collections4/bloomfilter/LayeredBloomFilterTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/collections4/bloomfilter/LayeredBloomFilterTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/collections4/bloomfilter/LayeredBloomFilterTest.java
Outdated
Show resolved
Hide resolved
@@ -41,4 +41,16 @@ public void testMergeShortBitMapProducer() { | |||
assertTrue(filter.merge(producer)); | |||
assertEquals(1, filter.cardinality()); | |||
} | |||
|
|||
@Test | |||
public void testCardinalityAndIsEmpty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just doing what the parent class does. The extra test should be in its own fixture, e.g.
@Test
public void testEmptyAfterMergeWithNothing() {
// test the case where is empty after merge
// in this case the internal cardinality == -1
BloomFilter bf = createEmptyFilter(getTestShape());
bf.merge(IndexProducer.fromIndexArray());
assertTrue(bf.isEmpty());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as fixture in abstract test class.
src/test/java/org/apache/commons/collections4/bloomfilter/WrappedBloomFilterTest.java
Outdated
Show resolved
Hide resolved
Hi All. Is this PR ready to be merged? |
There are some changes to make to the test code. I only reviewed this yesterday. |
👍 TY for the update |
@aherbert just a friendly ping. I'd like to get this closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only reviewed the changes since the last review. It looks OK. I noted a change to make the test able to support concurrent test execution.
import org.junit.jupiter.api.Test; | ||
|
||
public class CountingPredicateTest { | ||
private List<Pair<Integer, Integer>> expected = new ArrayList<>(); | ||
private List<Pair<Integer, Integer>> result = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be an argument to makeFunc
. The expected
can be created for each test that uses it.
Currently using a class level object works as JUnit is only executing one test at a time on an instance. But it is cleaner and more thread safe to pass in result
to the makeFunc
helper (which can be static).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aherbert, thanks for the suggestion. I made the changes.
@@ -28,11 +28,11 @@ | |||
import org.junit.jupiter.api.Test; | |||
|
|||
public class CountingPredicateTest { | |||
private List<Pair<Integer, Integer>> expected = new ArrayList<>(); | |||
private List<Pair<Integer, Integer>> result = new ArrayList<>(); | |||
//private List<Pair<Integer, Integer>> expected = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Claudenw I don't think we need to keep these.
I previously requested Claude to remove some commented out code in a test class. Otherwise this is fine. |
Implementation to fix COLLECTIONS-843.
This is a simple Layered Bloom filter implementation with complex configuration options.
A static method to create a fixed level implementation that does not merge is provided.
An example of a complex time based Bloom filter is provided in the testing code where filters are added to until they are saturated or one second has elapsed and are removed after they are 4 seconds old.
this change is in support of Kafka KIP-936
https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs